Formal Context Generation using Dirichlet Distributions
We suggest an improved way to randomly generate formal contexts based on Dirichlet distributions. For this purpose we investigate the predominant way to generate formal contexts, a coin-tossing model, recapitulate some of its shortcomings, and examine its stochastic model. Building on this, we propose our Dirichlet model and develop an algorithm employing this idea. By comparing our generation model to a coin-tossing model, we show that our approach is a significant improvement with respect to the variety of contexts generated. Finally, we outline a possible application in null model generation for formal contexts.
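To make the contrast concrete, here is a minimal NumPy sketch of both generation schemes. The Dirichlet variant shown (per-object attribute weights drawn from a symmetric Dirichlet, with a hypothetical per-row density draw) illustrates the idea only and is not the authors' exact algorithm.

```python
import numpy as np

def coin_toss_context(n_objects, n_attributes, p=0.3, rng=None):
    """Classical coin-tossing model: every cell is 1 with a fixed probability p."""
    rng = rng or np.random.default_rng()
    return rng.random((n_objects, n_attributes)) < p

def dirichlet_context(n_objects, n_attributes, alpha=1.0, rng=None):
    """Dirichlet-style model (sketch): for each object, draw a probability
    vector over attributes from a symmetric Dirichlet, then sample a row
    density and pick attributes according to that vector. Rows end up far
    more varied than under a single global p."""
    rng = rng or np.random.default_rng()
    ctx = np.zeros((n_objects, n_attributes), dtype=bool)
    for i in range(n_objects):
        weights = rng.dirichlet(np.full(n_attributes, alpha))
        k = rng.integers(0, n_attributes + 1)  # per-object attribute count (assumption)
        attrs = rng.choice(n_attributes, size=k, replace=False, p=weights)
        ctx[i, attrs] = True
    return ctx
```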
On the Usability of Probably Approximately Correct Implication Bases
We revisit the notion of probably approximately correct (PAC) implication bases from the literature and present a first formulation in the language of formal concept analysis, with the goal of investigating whether such bases are a suitable substitute for exact implication bases in practical use cases. To this end, we quantitatively examine the behavior of probably approximately correct implication bases on artificial and real-world data sets and compare their precision and recall with respect to the corresponding exact implication bases. Using a small example, we also provide qualitative insight that implications from probably approximately correct bases can still represent meaningful knowledge from a given data set.
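As an illustration of the kind of comparison performed, the following sketch computes precision and recall of an approximate basis against an exact one. The helper names and the exact precision/recall definitions are one plausible rendering, not necessarily the paper's; implications are pairs of attribute frozensets, and both bases are assumed nonempty.

```python
def close_under(attrs, implications):
    """Close an attribute set under a set of implications (fixpoint iteration)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed

def holds_in(context, premise, conclusion):
    """An implication holds if every object containing the premise also
    contains the conclusion; `context` maps objects to attribute sets."""
    return all(conclusion <= attrs
               for attrs in context.values() if premise <= attrs)

def precision_recall(context, exact_basis, pac_basis):
    """Precision: share of PAC implications valid in the data.
    Recall: share of exact implications entailed by the PAC basis,
    checked by closing each premise under the PAC implications."""
    precision = sum(holds_in(context, p, c) for p, c in pac_basis) / len(pac_basis)
    recall = sum(c <= close_under(p, pac_basis)
                 for p, c in exact_basis) / len(exact_basis)
    return precision, recall
```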
An Incremental Learning Method to Support the Annotation of Workflows with Data-to-Data Relations
Workflow formalisations are often focused on representing a process, with the primary objective of supporting execution. However, there are scenarios where what needs to be represented is the effect of the process on the data artefacts involved, for example when reasoning over the corresponding data policies. This can be achieved by annotating the workflow with the semantic relations that hold between these data artefacts. However, manually producing such annotations is difficult and time consuming. In this paper we introduce a recommendation-based method to support users in this task. Our approach is centred on an incremental association rule mining technique that compensates for the cold-start problem caused by the lack of a training set of annotated workflows. We discuss the implementation of a tool relying on this approach and how its application to an existing repository of workflows effectively enables the generation of such annotations.
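A minimal sketch of what such an incremental recommender could look like. The class and method names are hypothetical, and the paper's rule association mining is reduced here to confidence-weighted co-occurrence counts; the point is only the feedback loop that mitigates the cold start.

```python
from collections import Counter, defaultdict

class IncrementalRuleRecommender:
    """Sketch (not the authors' implementation): maintain running counts of
    (workflow feature, data-to-data relation) pairs and recommend the
    relation with the highest average confidence. Confirmed annotations are
    fed back in, so recommendations improve as the user works."""

    def __init__(self):
        self.feature_counts = Counter()
        self.pair_counts = defaultdict(Counter)

    def update(self, features, relation):
        """Record one confirmed annotation."""
        for f in features:
            self.feature_counts[f] += 1
            self.pair_counts[f][relation] += 1

    def recommend(self, features, min_conf=0.5):
        """Suggest the best relation, or None if evidence is too weak."""
        scores = Counter()
        for f in features:
            total = self.feature_counts[f]
            if total == 0:
                continue
            for rel, n in self.pair_counts[f].items():
                scores[rel] += n / total  # confidence of f -> rel
        if not features or not scores:
            return None
        rel, score = scores.most_common(1)[0]
        return rel if score / len(features) >= min_conf else None
```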
Fast Generation of Best Interval Patterns for Nonmonotonic Constraints
In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is the support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all its superpatterns are infrequent. However, many other constraints for pattern selection are neither monotonic nor anti-monotonic, which makes it difficult to generate patterns satisfying them. In this paper we introduce the notion of "generalized monotonicity" and the Sofia algorithm, which generates best patterns in polynomial time for some nonmonotonic constraints, modulo constraint computation and pattern extension operations. In particular, the algorithm is polynomial for data on itemsets and interval tuples. We consider stability and the delta-measure, which are nonmonotonic constraints, and apply them to interval tuple datasets. In the experiments, we compute the best interval tuple patterns w.r.t. these measures and show the advantage of our approach over post-filtering approaches.
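The basic pattern-structure operations on interval tuples over which such constraints are evaluated can be sketched as follows. This is a minimal illustration of the interval-pattern meet and extent, not the Sofia algorithm itself; patterns are tuples of (low, high) pairs, one per numeric attribute.

```python
def meet(p, q):
    """Similarity of two interval patterns: component-wise convex hull,
    i.e. the smallest interval containing both (the usual pattern-structure
    meet for interval tuples)."""
    return tuple((min(a1, b1), max(a2, b2))
                 for (a1, a2), (b1, b2) in zip(p, q))

def extent(pattern, data):
    """Objects whose interval tuples are subsumed by (contained in) the pattern."""
    def subsumes(pat, obj):
        return all(a1 <= b1 and b2 <= a2
                   for (a1, a2), (b1, b2) in zip(pat, obj))
    return [g for g, desc in data.items() if subsumes(pattern, desc)]

# Example: two objects described by two numeric attributes
data = {"g1": ((1, 1), (4, 4)), "g2": ((2, 2), (3, 3))}
p = meet(data["g1"], data["g2"])   # ((1, 2), (3, 4))
support = len(extent(p, data))     # 2
```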
Some Programming Optimizations for Computing Formal Concepts
This paper describes in detail some optimization approaches taken to improve the efficiency of computing formal concepts. In particular, it describes the use and manipulation of bit-arrays to represent FCA structures and to carry out the typical operations undertaken in computing formal concepts, thus providing data structures that are both memory-efficient and time-saving. The paper also examines the issues and compromises involved in computing and storing formal concepts, describing a number of data structures that illustrate the classical trade-off between memory footprint and code efficiency. Given that there has been limited publication of these programmatic aspects, these optimizations will be useful to programmers in this area, and also to any programmer interested in optimizing software that implements Boolean data structures. The optimizations are shown to significantly increase performance by comparing an unoptimized implementation with the optimized one.
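A minimal Python illustration of the idea, using arbitrary-precision integers as bit-arrays (a real implementation, as in the paper, would use fixed-size machine-word arrays in a lower-level language):

```python
def attribute_extents(rows):
    """Build one integer per attribute, with bit i set iff object i has
    that attribute, so a whole cross-table column is a single value."""
    n_attrs = len(rows[0])
    extents = [0] * n_attrs
    for i, row in enumerate(rows):
        for j, has in enumerate(row):
            if has:
                extents[j] |= 1 << i
    return extents

# The core FCA step, intersecting an extent with an attribute-extent,
# becomes a single bitwise AND, and support is a popcount:
rows = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
cols = attribute_extents(rows)
extent = cols[0] & cols[1]        # objects having both attributes 0 and 1
support = extent.bit_count()      # Python >= 3.10; here 1 (object 0)
```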
Making Use of Empty Intersections to Improve the Performance of CbO-Type Algorithms
This paper describes how the performance of Close-by-One-type algorithms can be improved by making use of empty intersections in the computation of formal concepts. During the computation, if the intersection between the current concept extent and the next attribute-extent is empty, this fact can simply be inherited by subsequent children of the current concept, and subsequent intersections with the same attribute-extent can be skipped. Because these intersections require testing each object in the current extent, significant time savings can be made by avoiding them. The paper also shows how further time savings can be made by forgoing the traditional canonicity test for new extents when the intersection is empty. Finally, the paper describes how, because of typical optimizations made in implementations of CbO-type algorithms, even more time can be saved by amalgamating inherited attributes and inherited empty intersections into a single, simple test.
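A compact sketch of a CbO-style recursion with this inheritance, using frozensets rather than bit-arrays for readability. This is a hypothetical rendering of the optimization, not the paper's code; the bottom concept (empty extent) would be handled separately if required.

```python
def intent_of(objects, context, n_attrs):
    """Attributes shared by all objects in the set."""
    return {j for j in range(n_attrs) if all(j in context[g] for g in objects)}

def cbo(context, n_attrs, extent, intent, y=0, skip=frozenset(), out=None):
    """Close-by-One with inherited empty intersections: attributes in `skip`
    produced an empty intersection at an ancestor; since extents only shrink
    down the recursion, those intersections stay empty and are passed over
    without recomputation or canonicity testing."""
    out = [] if out is None else out
    out.append((extent, intent))
    for j in range(y, n_attrs):
        if j in intent or j in skip:
            continue
        new_extent = frozenset(g for g in extent if j in context[g])
        if not new_extent:
            skip = skip | {j}          # all later children inherit this
            continue
        new_intent = intent_of(new_extent, context, n_attrs)
        # canonicity: no attribute below j may have been newly closed in
        if {a for a in new_intent if a < j} == {a for a in intent if a < j}:
            cbo(context, n_attrs, new_extent, new_intent, j + 1, skip, out)
    return out

# Usage: context maps objects to attribute sets
ctx = {0: {0, 1}, 1: {0, 2}, 2: {1, 2}}
objs = frozenset(ctx)
concepts = cbo(ctx, 3, objs, intent_of(objs, ctx, 3))
```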
On Coupling FCA and MDL in Pattern Mining
Pattern mining is a well-studied field in data mining and machine learning. Modern methods are based on dynamically updated models, among which MDL-based ones ensure high-quality pattern sets. Formal concepts also characterize patterns in a condensed form. In this paper we study the MDL-based algorithm Krimp in the FCA setting and propose a modified version that benefits from FCA and relies on the probabilistic assumptions that underlie MDL. We provide experimental evidence that the proposed approach improves the quality of the pattern sets generated by Krimp.
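The MDL score that such methods minimize can be sketched in a few lines using standard two-part code lengths. This is illustrative only, not Krimp's full cover algorithm, and the code-table cost is simplified.

```python
from math import log2

def code_lengths(usage):
    """Shannon-optimal code length per pattern from its usage count in the
    cover of the database: frequently used patterns get short codes."""
    total = sum(usage.values())
    return {x: -log2(n / total) for x, n in usage.items() if n > 0}

def total_description_length(cover, usage):
    """Two-part MDL score L(CT, D) = L(CT) + L(D | CT). The code-table cost
    is reduced here to the cost of the codes themselves; Krimp additionally
    charges for encoding the patterns' contents."""
    lengths = code_lengths(usage)
    l_db = sum(lengths[x] for transaction in cover for x in transaction)
    l_ct = sum(lengths.values())
    return l_ct + l_db

# Usage: a cover assigns each transaction a list of code-table patterns
cover = [["ab"], ["ab", "c"]]
usage = {"ab": 2, "c": 1}
score = total_description_length(cover, usage)
```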
Creation and evolution of magnetic helicity
Projecting a non-Abelian SU(2) vacuum gauge field - a pure gauge constructed from the group element U - onto a fixed (electromagnetic) direction in isospace gives rise to a nontrivial magnetic field with nonvanishing magnetic helicity, which coincides with the winding number of U. Although the helicity is not conserved under Maxwell (vacuum) evolution, it retains one-half of its initial value at infinite time.
Revisiting Pattern Structure Projections
Formal concept analysis (FCA) is a well-founded method for data analysis and has many applications in data mining. Pattern structures are an extension of FCA for dealing with complex data such as sequences or graphs. However, the computational complexity of computing with pattern structures is high, and projections of pattern structures were introduced to simplify computation. In this paper we introduce o-projections of pattern structures, a generalization of projections that defines a wider class of projections preserving the properties of the original approach. Moreover, we show that o-projections form a semilattice, and we discuss the correspondence between o-projections and the representation contexts of o-projected pattern structures.
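Projections of pattern structures are kernel operators on the pattern order; a tiny checker for the defining axioms makes this concrete. A sketch for finite pattern sets, with `leq` the subsumption order (here: itemset patterns ordered by inclusion, where smaller means more general); the example projection simply forgets attributes.

```python
def is_projection(psi, patterns, leq):
    """Check the kernel-operator axioms a projection must satisfy:
    contractive (psi(d) is more general than d), monotone, and idempotent."""
    contractive = all(leq(psi(d), d) for d in patterns)
    monotone = all(leq(psi(c), psi(d))
                   for c in patterns for d in patterns if leq(c, d))
    idempotent = all(psi(psi(d)) == psi(d) for d in patterns)
    return contractive and monotone and idempotent

# Example: restrict itemset patterns to a fixed attribute subset
keep = frozenset({"a", "b"})
psi = lambda d: d & keep            # forget all attributes outside `keep`
patterns = [frozenset(s) for s in ({"a"}, {"a", "b"}, {"b", "c"}, set())]
assert is_projection(psi, patterns, lambda c, d: c <= d)
```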
DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups
We strive to find contexts (i.e., subgroups of entities) under which exceptional (dis-)agreement occurs among a group of individuals, in any type of data featuring individuals (e.g., parliamentarians, customers) performing observable actions (e.g., votes, ratings) on entities (e.g., legislative procedures, movies). To this end, we introduce the problem of discovering statistically significant exceptional contextual intra-group agreement patterns. To handle the sparsity inherent in voting and rating data, we use Krippendorff's Alpha measure to assess the agreement among individuals. We devise a branch-and-bound algorithm, named DEvIANT, to discover such patterns. DEvIANT exploits both closure operators and tight optimistic estimates. We derive analytic approximations for the confidence intervals (CIs) associated with patterns, enabling a computationally efficient significance assessment. We prove that these approximate CIs are nested along specialization of patterns, which allows us to incorporate pruning properties in DEvIANT to quickly discard non-significant patterns. An empirical study on several datasets demonstrates the efficiency and usefulness of DEvIANT. This is the technical report associated with the ECML/PKDD 2019 paper "DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups".
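Krippendorff's Alpha itself is a standard measure; a minimal nominal-data version shows how it copes with the missing ratings that make voting data sparse (only pairable values inside each unit contribute). This is a generic sketch, independent of DEvIANT's internals.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal data. `units` is a list of rating
    lists, one per rated entity; units with fewer than two ratings are not
    pairable and are skipped. Returns 1 - D_o / D_e."""
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)
    if n <= 1:
        return 1.0
    # observed disagreement: mismatching ordered pairs within each unit
    d_o = sum(
        sum(c != k for c, k in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n
    # expected disagreement from the pooled value distribution
    totals = Counter(v for u in units for v in u)
    d_e = sum(totals[c] * totals[k]
              for c in totals for k in totals if c != k) / (n * (n - 1))
    return 1.0 if d_e == 0 else 1.0 - d_o / d_e

# Usage: three units; the single-rating unit is ignored as unpairable
alpha = krippendorff_alpha_nominal([["yes", "yes", "no"], ["no", "no"], ["yes"]])
```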